A practical Message-to-Speech strategy for dialogue systems

Authors

  • Peter Spyns
  • Filip Deprez
  • Luc Van Tichelen
  • Bert Van Coile
Abstract

In this paper, we present a Message-to-Speech system for Natural Language Generation that is to be integrated in a dialogue system. As the system has to function in a very restrictive environment with respect to computational resources, a compromise between concept-based and template-based generation systems had to be found. Still, the approach aims at achieving linguistic flexibility for the utterances and attaining a natural-sounding prosody.

1 Introduction

Many of the Natural Language Generation (NLG) systems that produce flexible output, i.e. sentences with variations on the syntactic and morphological levels, aim only at the production of written text and do not deal with spoken language. As a result, the important topic of generating natural prosody is not touched upon (see e.g. (Elhadad, 1992; Reiter et al., 1995; Dalianis, 1996; Somers et al., 1997)). On the other hand, message generating systems that provide speech of a natural quality (e.g. announcement systems, phone banking and voice mail applications) often combine fixed pieces of pre-recorded speech. These text and message generating systems are either resource intensive (powerful CPU, large storage and memory capacity, ...) or provide only limited flexibility, which seriously hampers their integration in a dialogue system.

The Message-to-Speech (MTS) system described below is specifically designed to function in an environment with severely restricted computational resources, where it is impossible to store large amounts of pre-recorded speech. In this context, Text-to-Speech (TTS) is an obvious alternative. However, for dialogue systems using a predefined set of message types, the use of special purpose prosody models can lead to a prosodic quality that is superior to the one generated by TTS systems, which apply general purpose prosody models for unrestricted text (see also (Hovy, 1995, p.161)).
Our prosody transplantation tool (see section 2) exploits this idea: for the fixed parts of a message, it allows the prosody generated by general models, as in TTS, to be overruled by specific prosody copied from natural speech. Prosody by general model is used only for those parts of the message where flexibility is needed. The MTS system combines transplanted prosody with prosody by model in order to cope with partly variable messages while still preserving natural prosody (Van Coile et al., 1995).

Details on the MTS system are provided in the third section. It consists of two components: the MTS generation and the MTS prosodic integration parts. The former module (see section 3.1) is template driven (canned "text" interspersed with slots). For a discussion of template driven systems see (van Deemter et al., 1994; van Deemter and Odijk, 1997; Reiter, 1995). The templates account for the flexibility, including the linguistic variation, of the messages. The latter module (see section 3.2) takes care of assimilation and the prosodic integration of the slot values with the rest of the template. A discussion concludes this paper (see section 4).

2 Prosody Transplantation

The idea behind Prosody Transplantation is to copy intonation and duration values from a recorded donor message (human speech) to the phonetic transcription of the same message. The Enriched Phonetic Transcription (EPT) obtained in this manner can be fed to a TTS system whose normal linguistic and prosodic modules (based on general models) are bypassed (Phonetics-to-Speech, PTS). Only the segmental synthesis and the synthesiser modules are used. An example of an EPT is provided by figure 1. The first value between square brackets is the phoneme duration (in ms), optionally followed by one or more intonation breakpoints. Each breakpoint consists of a location value (in ms) relative to the beginning of the phoneme, followed by a pitch
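The combination of transplanted prosody and prosody by model described in this excerpt can be illustrated with a minimal Python sketch. It is not the authors' implementation. All names (`Phoneme`, `Fixed`, `Slot`, `realise`, `flat_model`) are hypothetical, and so are the phoneme symbols and numeric values. The actual EPT syntax of figure 1 is not reproduced in this excerpt, and the pitch unit is left unspecified because the source text breaks off at that point. The sketch only assumes what the text states: each phoneme carries a duration in ms and optional breakpoints located relative to the phoneme start, fixed template parts keep donor prosody, and slot values receive prosody from a general model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical breakpoint: (offset in ms from the phoneme start, pitch value).
# The pitch unit is deliberately left unspecified.
Breakpoint = Tuple[int, float]


@dataclass
class Phoneme:
    symbol: str
    duration_ms: int
    breakpoints: List[Breakpoint] = field(default_factory=list)


@dataclass
class Fixed:
    """Canned part of a template; prosody copied from a donor recording."""
    phonemes: List[Phoneme]


@dataclass
class Slot:
    """Variable part of a template; filled at run time, prosody by model."""
    name: str


Template = List[Union[Fixed, Slot]]
ProsodyModel = Callable[[List[str]], List[Phoneme]]


def flat_model(symbols: List[str]) -> List[Phoneme]:
    """Stand-in for a general prosody model: 80 ms each, no breakpoints."""
    return [Phoneme(s, 80) for s in symbols]


def realise(template: Template,
            slot_values: Dict[str, List[str]],
            model: ProsodyModel = flat_model) -> List[Phoneme]:
    """Concatenate fixed parts (donor prosody) and slot values (model prosody)."""
    out: List[Phoneme] = []
    for part in template:
        if isinstance(part, Fixed):
            out.extend(part.phonemes)  # transplanted prosody kept unchanged
        else:
            out.extend(model(slot_values[part.name]))  # prosody by model
    return out


def total_duration_ms(phonemes: List[Phoneme]) -> int:
    return sum(p.duration_ms for p in phonemes)


# Invented example data, for illustration only.
template: Template = [
    Fixed([Phoneme("j", 60, [(0, 110.0)]),
           Phoneme("u", 120, [(40, 130.0), (100, 105.0)])]),
    Slot("count"),
]
utterance = realise(template, {"count": ["t", "u"]})
print(len(utterance), total_duration_ms(utterance))
```

Keeping fixed parts and slots as separate types makes the division of labour explicit: only slot values ever pass through the prosody model, so a better model can be swapped in without touching the donor prosody.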

Similar resources

Modeling Lateral Communication in Holonic Multi Agent Systems

Agents, in a multi agent system, communicate with each other through the process of exchanging messages which is called dialogue. Multi agent organization is generally used to optimize agents’ communications. Holonic organization demonstrates a self-similar recursive and hierarchical structure in which each holon may include some other holons. In a holonic system, lateral communication occurs b...


Interpreter for Highly Portable Spoken Dialogue System

Recently the technology for speech recognition and language processing for spoken dialogue systems has been improved, and speech recognition systems and dialogue systems have been developed to the extent of practical usage. In order to become more practical, not only those fundamental techniques but also the techniques of portability and expansibility should be developed. In our previous resear...


Office message center - a spoken dialogue system

This paper describes the experience gained from structuring a spoken dialogue system and its key components during the design and development of a telephony-based office message center, which integrates auto-attendant, email access and meeting scheduling capabilities through a spoken dialogue interface. A building block dialogue toolkit has been designed based on these experiences and efficie...


Message-To-Speech: High Quality Speech Generation For Messaging And Dialogue Systems

In this paper, we present a Message-to-Speech (MTS) system that offers the linguistic flexibility desired for spoken dialogue and message generating systems. The use of prosody transplantation and special purpose prosody models results in highly natural prosody for the synthesised speech.


Comparing ASR modeling methods for spoken dialogue simulation and optimal strategy learning

Speech-enabled interfaces are nowadays becoming ubiquitous. The most advanced ones rely on probabilistic pattern matching systems and especially on automatic speech recognition systems. Because of their statistical nature, such systems never reach one hundred percent correct recognition. Performance is linked to environmental noise and to intra- and inter-speaker vari...


Publication date: 1997